ABSTRACT



Hardware-assisted real-time garbage collection offers high throughput and small worst-case bounds 
on the times required to allocate dynamic objects and to access the memory contained within 
previously allocated objects. Whether the proposed technology is cost effective depends on various 
choices between configuration alternatives. This paper reports the performance of several different 
configurations of the hardware-assisted real-time garbage collection system subjected to several 
different workloads. Reported measurements demonstrate that hardware-assisted real-time garbage 
collection is a viable alternative to traditional explicit memory management techniques, even for
low-level languages like C++.

State of the art Java Virtual Machines with Just-In-Time (JIT) compilers make use of
advanced compiler techniques, run-time profiling and adaptive compilation to improve 
performance. However, these techniques for alleviating performance bottlenecks are more 

5 Using complete system simulation to characterize SPECjvm98 benchmarks
Tao Li, Lizy Kurian John, Vijaykrishnan Narayanan, Anand Sivasubramaniam, Jyotsna 
Sabarinathan, Anupama Murthy 

May 2000 Proceedings of the 14th international conference on Supercomputing 

Complete system simulation to understand the influence of architecture and operating 
systems on application execution has been identified to be crucial for systems design. While 
there have been previous attempts at understanding the architectural impact of Java 

This study compares the speed, area, and power of different implementations of Active
Pages [OCS98], an intelligent memory system which helps bridge the growing gap between 
processor and memory performance by associating simple functions with each page of data. 
Previous investigations have shown up to 1000X speedups using a block of reconfigurable 

Our new out-of-order processor simulator, FastSim, uses two innovations to speed up
simulation 8-15 times (vs. Wisconsin SimpleScalar) with no loss in simulation accuracy. 
First, FastSim uses speculative direct-execution to accelerate the functional emulation of 
speculatively executed program code. Second, it uses a variation on memoization— a well- 

Hardware multithreading is becoming a generally applied technique in the next generation of
microprocessors. Several multithreaded processors are announced by industry or already 
into production in the areas of high-performance microprocessors, media, and network 

It is well-known that multiprocessor systems are vastly more difficult to program than
systems that support sequential programming models. In a 1998 paper [11] this author
argued that six important principles for supporting modular software construction are often 
violated by the architectures proposed for multiprocessor computer systems. The Fresh 

Software prefetching is a promising technique to hide cache miss latencies, but it remains
challenging to effectively prefetch pointer-based data structures because obtaining the 
memory address to be prefetched requires pointer dereferences. The recently proposed 
stride prefetching overcomes this problem, but it only exploits inter-iteration stride patterns 
and relies on an off-line profiling method. We propose a new algorithm for stride prefetching 
which is intended for use in a dynamic ... 

caches by actively creating spatial locality, facilitating prefetching, and avoiding cache
conflicts and false sharing. Unfortunately, it is extremely difficult to guarantee that such 
optimizations are safe in practice on today's machines, since accurately updating all pointers 
to an object requires perfect alias information, which is well beyond the scope of the 
compiler for languages such as C. T ... 

Speculative execution is an important source of parallelism for VLIW and superscalar
processors. A serious challenge with compiler-controlled speculative execution is to 
accurately detect and report all program execution errors at the time of occurrence. In this 
paper, a set of architectural features and compile-time scheduling support referred to as 
sentinel scheduling is introduced. Sentinel scheduling provides an effective framework for 
compiler-controlled speculative ex ... 

Java workloads are becoming more and more prominent on various computing devices.
Understanding the behavior of a Java workload which includes the interaction between the 
application and the virtual machine (VM), is thus of primary importance during performance 
analysis and optimization. Moreover, as contemporary software projects are increasing in 
complexity, automatic performance analysis techniques are indispensable. This paper 
proposes an off-line method-level phase analysis approach for ... 

Register integration (or just integration) is a register renaming discipline that implements
instruction reuse via physical register sharing. Initially developed to perform squash reuse, 
the integration mechanism can exploit more reuse scenarios. Here, we describe three 
extensions to the original design that expand its applicability and boost its performance 
impact. First, we extend squash reuse to general reuse. Whereas squash reuse maintains 
the concept of an instruction instance "owning" its ... 

Common hardware exceptions, when implemented by trapping, unnecessarily serialize
program execution in dynamically scheduled superscalar processors. To avoid the 
consequences of trapping the main program thread, multithreaded CPUs can exploit control 
and data independence by executing the exception handler in a separate hardware context. 
The main thread doesn't squash instructions after the excepting instruction, conserving fetch 
bandwidth and allowing execution of instructions inde ... 

The Intel® Itanium® architecture contains a number of innovative compiler-controllable
features designed to exploit instruction level parallelism. New code generation and 
optimization techniques are critical to the application of these features to improve processor 
performance. For instance, the Itanium® architecture provides a compiler-controllable 
virtual register stack to reduce the penalty of memory accesses associated with procedure 
calls. The Itanium® Register Stack Engine ... 